Approximate Representation of Textual Documents in the Concept Space
Abstract
In this paper we deal with the problem of adding new documents to a collection when the documents are represented in a lower-dimensional space by concept indexing. Concept indexing (CI) is a feature construction method that relies on the concept decomposition of the term-document matrix. Using CI, the original document representations are projected onto the space spanned by the cluster centroids, which are called concept vectors. This problem is of particular interest for applications on the World Wide Web. The proposed methods are tested on an information retrieval task. The vectors onto which documents are projected during dimension reduction are constructed from the representations of all documents in the collection. Computing new representations in the reduced-dimension space therefore requires recomputing the concept decomposition. A solution to this problem is to develop methods that give approximate representations of newly added documents in the reduced-dimension space. The paper introduces two methods for adding new documents to the reduced-dimension space. In the first method, no new index terms are added, and the added documents are represented by the existing list of ...
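The abstract does not spell out the computation, so what follows is only a minimal sketch under stated assumptions. It assumes the standard concept decomposition: concept vectors are the normalized centroids produced by spherical k-means on unit-length document vectors, and each document is approximated by a least-squares combination of those vectors. It also assumes that the first method amounts to "folding in" a new document. Under that reading, the document's term vector is restricted to the existing index terms and projected onto the existing concept vectors without recomputing them.

Several pieces are hypothetical illustrations, not the paper's code:

- the helper names `concept_vectors`, `project` and `fold_in`;
- the deterministic farthest-first seeding;
- the toy vocabulary.

```python
import math

# Sketch under stated assumptions; NOT the paper's implementation.

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def concept_vectors(docs, k, iters=20):
    """Spherical k-means on unit-length document vectors; returns the k
    normalized cluster centroids (concept vectors)."""
    # Hypothetical deterministic seeding: start from the first document, then
    # repeatedly add the document least similar to the seeds chosen so far.
    concepts = [list(docs[0])]
    while len(concepts) < k:
        far = min(docs, key=lambda d: max(dot(d, c) for c in concepts))
        concepts.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in docs:  # assign each document to its most similar concept
            best = max(range(k), key=lambda j: dot(d, concepts[j]))
            clusters[best].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the previous concept vector if a cluster empties
                centroid = [sum(col) / len(members) for col in zip(*members)]
                concepts[j] = normalize(centroid)
    return concepts

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def project(concepts, doc):
    """Least-squares coordinates z minimizing ||C z - doc||, where the columns
    of C are the concept vectors: solves (C^T C) z = C^T doc."""
    gram = [[dot(ci, cj) for cj in concepts] for ci in concepts]
    rhs = [dot(ci, doc) for ci in concepts]
    return solve(gram, rhs)

def fold_in(tokens, vocabulary, concepts):
    """Hypothetical helper: approximate reduced representation of a NEW document.
    Only terms already in the index vocabulary are counted (unseen terms are
    dropped); the vector is normalized and projected onto the EXISTING concept
    vectors, which are not recomputed."""
    index = {t: i for i, t in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1.0
    return project(concepts, normalize(vec))

# Toy usage: 4 index terms, 6 documents forming two obvious topics.
vocabulary = ["web", "page", "matrix", "vector"]
raw = [[1, 1, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0],
       [0, 0, 1, 1], [0, 0, 2, 1], [0, 0, 1, 2]]
docs = [normalize([float(x) for x in d]) for d in raw]
concepts = concept_vectors(docs, k=2)
Z = [project(concepts, d) for d in docs]  # reduced representations of the collection
z_new = fold_in(["web", "page", "crawler"], vocabulary, concepts)  # "crawler" is not indexed
```

The concept vectors stay frozen while documents are folded in, so adding a document costs only a small k-by-k solve instead of a new clustering run. The trade-off is that the representation is approximate: as the collection grows, the stored concept vectors no longer match the ones a full recomputation of the concept decomposition would give.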
Similar resources
The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning
In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with the traditional method of teaching vocabulary. Eighty advanced EFL learners were assigned to four intact groups (three experimental and one control group) through a proficiency test and a vocabulary test used as a pre-test. In the visual group, stu...
Algorithm for Classification of Textual Documents Represented by Tandem Analysis
This research presents an algorithm for classifying textual documents that are represented in a space of reduced dimension with respect to the original bag-of-words representation. The algorithm is carried out in two steps. In the first step, classification is conducted for documents in the original bag-of-words representation, while in the second step classification is conducted for ...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering previously unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
A Note on Solving Prandtl's Integro-Differential Equation
A simple method for solving Prandtl's integro-differential equation is proposed based on a new reproducing kernel space. By applying a transformation and modifying the traditional reproducing kernel method, the singular term is removed, and the analytical representation of the exact solution is obtained in the form of a series in the new reproducing kernel space. Compared with known investigations, its ...
Space as a Semiotic Object: A Three-Dimensional Model of Vertical Structure of Space in Calvino’s Invisible Cities
Following the “spatial turn” of the last three decades in the humanities and social sciences, and the structure of the semiotic object, this research studies space as the main semiotic object of Calvino’s (1972) Invisible Cities. The significance of this application lies in examining the possibility of providing a more concrete methodology based on the integration of Zoran’s (1984) 3 vertical levels of const...
Journal title:
- Informatica (Slovenia)
Volume 31, Issue
Pages -
Publication date: 2007